<!DOCTYPE html>

<html>

  <head>
    <title>Robotic Manipulation: Reinforcement Learning</title>
    <meta name="Robotic Manipulation: Reinforcement Learning" content="text/html; charset=utf-8;" />
    <link rel="canonical" href="http://manipulation.csail.mit.edu/rl.html" />

    <script src="https://hypothes.is/embed.js" async></script>
    <script type="text/javascript" src="htmlbook/book.js"></script>

    <script src="htmlbook/mathjax-config.js" defer></script> 
    <script type="text/javascript" id="MathJax-script" defer
      src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js">
    </script>
    <script>window.MathJax || document.write('<script type="text/javascript" src="htmlbook/MathJax/es5/tex-chtml.js" defer><\/script>')</script>

    <link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
    <script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
    <script>hljs.initHighlightingOnLoad();</script>

    <link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
  </head>

<body onload="loadChapter('manipulation');">

<div data-type="titlepage">
  <header>
    <h1><a href="index.html" style="text-decoration:none;">Robotic Manipulation</a></h1>
    <p data-type="subtitle">Perception, Planning, and Control</p> 
    <p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
    <p style="font-size: 14px; text-align: right;"> 
      &copy; Russ Tedrake, 2020<br/>
      Last modified <span id="last_modified"></span>.</br>
      <script>
      var d = new Date(document.lastModified);
      document.getElementById("last_modified").innerHTML = d.getFullYear() + "-" + (d.getMonth()+1) + "-" + d.getDate();</script>
      <a href="misc.html">How to cite these notes, use annotations, and give feedback.</a><br/>
    </p>
  </header>
</div>

<p><b>Note:</b> These are working notes used for <a
href="http://manipulation.csail.mit.edu/Fall2020/">a course being taught
at MIT</a>. They will be updated throughout the Fall 2020 semester.  <!-- <a 
href="https://www.youtube.com/channel/UChfUOAhz7ynELF-s_1LPpWg">Lecture  videos are available on YouTube</a>.--></p> 

<table style="width:100%;"><tr style="width:100%">
  <td style="width:33%;text-align:left;"><a class="previous_chapter" href=trajectories.html>Previous Chapter</a></td>
  <td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
  <td style="width:33%;text-align:right;"><a class="next_chapter" href=drake.html>Next Chapter</a></td>
</tr></table>


<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter style="counter-reset: chapter 8"><h1>Reinforcement Learning</h1>
  <div style="clear:right;"></div>

  <p>This is a placeholder chapter. Content to be filled in.</p>

  <section><h1>Exercises</h1>
    <exercise><h1>Stochastic Optimization</h1>
      <p> For this exercise, you will implement a stochastic optimization scheme that does not require exact analytical gradients. You will work exclusively in <a href="https://colab.research.google.com/github/RussTedrake/manipulation/blob/master/exercises/rl/stochastic_optimization.ipynb" target="_blank">this notebook</a>. You will be asked to complete the following steps: </p>
      <ol type="a">
        <li> Implement gradient descent with exact analytical gradients.
        </li>
        <li> Implement stochastic gradient descent with approximated gradients. </li>
        <li> Prove that the expected value of the stochastic update does not change with baselines.</li>
        <li> Implement stochastic gradient descent with baselines. 
        </li>
      </ol>
    </exercise>    

    <exercise><h1>REINFORCE</h1>
      <p> For this exercise, you will implement the vanilla REINFORCE algorithm on a box pushing task. You will work exclusively in <a href="https://colab.research.google.com/github/RussTedrake/manipulation/blob/master/exercises/rl/policy_gradient.ipynb" target="_blank">this notebook</a>. You will be asked to complete the following steps: </p>
      <ol type="a">
        <li> Implement the policy loss function. </li>
        <li> Implement the value loss function. </li>
        <li> Implement the advantage function. </li>
      </ol>
    </exercise>
  </section>

</chapter>
<!-- EVERYTHING BELOW THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->

<table style="width:100%;"><tr style="width:100%">
  <td style="width:33%;text-align:left;"><a class="previous_chapter" href=trajectories.html>Previous Chapter</a></td>
  <td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
  <td style="width:33%;text-align:right;"><a class="next_chapter" href=drake.html>Next Chapter</a></td>
</tr></table>

<div id="footer">
  <hr>
  <table style="width:100%;">
    <tr><td><a href="https://accessibility.mit.edu/">Accessibility</a></td><td style="text-align:right">&copy; Russ
      Tedrake, 2020</td></tr>
  </table>
</div>


</body>
</html>
